AITopics

Country: North America > United States (0.28)

Genre: Research Report > Experimental Study (1.00)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (1.00)

Neural Information Processing SystemsJun-10-2026, 03:08:18 GMT

Optimistic Query Routing in Clustering-based Approximate Maximum Inner Product Search

Clustering-based nearest neighbor search algorithms partition points into shards to form an index, and search only a subset of shards to process a query. Even though search efficacy is heavily influenced by the algorithm that identifies the shards to probe, it has received little attention in the literature. We study routing in clustering-based maximum inner product search, which includes cosine similarity search. We unpack existing routers and notice the surprising role of optimism. We then take a page from the sequential decision making literature and formalize that insight following the principle of ``optimism in the face of uncertainty.'' In particular, we present a framework that incorporates the moments of the distribution of inner products within each shard to estimate the maximum inner product. We then develop a practical instance of our algorithm that uses only the first two moments to reach the same accuracy as state-of-the-art routers by probing up to $50\%$ fewer points on benchmark datasets without compromising efficiency. Our algorithm is also space-efficient: we design a sketch of the second moment whose size is independent of the number of points and requires $\mathcal{O}(1)$ vectors per shard.

artificial intelligence, natural language, proceedings, (7 more...)

Technology:

Information Technology > Artificial Intelligence > Representation & Reasoning (0.97)
Information Technology > Information Management > Search (0.63)
Information Technology > Artificial Intelligence > Natural Language (0.60)

Galley, Jakob, Shahverdi, Vahid, Flinth, Axel

Conservation Laws from Data Symmetry in Neural Networks

arXiv.org Machine LearningJun-10-2026

We explore whether intrinsic symmetries of the training data lead to conserved quantities during gradient-flow training of neural networks. Under the assumption that the loss function is analytic and non-polynomial, we prove that data symmetries generically do not induce any additional integrals of motion. For mean squared error (MSE) loss, on the other hand, there are situations in which data augmentation yields extra conserved quantities. We build a framework, utilizing tensorizable networks to describe this phenomenon. Tensorizable networks are a family of architectures whose dependence on parameters and inputs can be separated using an intermediate representation. They include linear and Figure 1: A display of how data symmetry can give polynomial networks, as well as Lightning At-rise to conservation laws. The top row shows the tention.

artificial intelligence, machine learning, symmetry, (18 more...)

2606.10913

Country: Europe (0.28)

Genre: Research Report (0.64)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)

Yamauchi, Shogo, Nitta, Tohru, Tamori, Hideaki

Quaternion Self-Attention with Shared Scores

arXiv.org Machine LearningMay-26-2026

Quaternion neural networks are parameter-efficient and model multidimensional dependencies by representing four related features as a single entity. However, existing quaternion self-attention computes component-wise scores and applies independent softmax operations to each component, which increases the computational cost and allows attention distributions to diverge across components. We propose a shared-score quaternion self-attention mechanism that computes a single real-valued score using the quaternion inner product and applies a shared attention distribution across all components. This reduces score-computation multiplications by 75% and the number of softmax operations from four to one. We prove that, when queries and keys are produced by quaternion linear projections that induce component pre-mixing, the component-wise and shared scores lie in the same interaction subspace, indicating that independent component-wise attention primarily re-parameterizes the same interactions rather than expanding the feature interaction space. In speech enhancement, our method reduces inference time by up to 44.3% on a GPU and 58.1% on a CPU while maintaining quality, with consistent trends across vision and natural language processing.

artificial intelligence, machine learning, natural language, (18 more...)

2605.2492

Country:

Asia > Japan (0.28)
North America > United States (0.28)

Genre: Research Report > New Finding (1.00)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Neural Networks (1.00)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.88)

Neural Information Processing SystemsApr-29-2026, 20:22:51 GMT

d066d21c619d0a78c5b557fa3291a8f4-Paper-Conference.pdf

artificial intelligence, machine learning, natural language, (19 more...)

Country:

North America > United States (0.28)
Europe > United Kingdom > England (0.28)

Genre: Research Report > New Finding (0.46)

Technology:

Information Technology > Artificial Intelligence > Natural Language (1.00)
Information Technology > Artificial Intelligence > Representation & Reasoning (0.68)
Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Regression (0.46)
Information Technology > Artificial Intelligence > Machine Learning > Performance Analysis > Accuracy (0.46)

Shashank Singh, Simon S. Du, Barnabas Poczos

Efficient Nonparametric Smoothness Estimation

Neural Information Processing SystemsApr-21-2026, 17:59:00 GMT

Neural Information Processing Systems http://nips.cc/

artificial intelligence, estimator, machine learning, (14 more...)

Country:

North America > United States (0.28)
Europe (0.28)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning (0.68)

Soto-Larrosa, Víctor, Torrado, Nuria, Huertas, Edmundo J.

Structural interpretability in SVMs with truncated orthogonal polynomial kernels

arXiv.org Machine LearningApr-17-2026

We study post-training interpretability for Support Vector Machines (SVMs) built from truncated orthogonal polynomial kernels. Since the associated reproducing kernel Hilbert space is finite-dimensional and admits an explicit tensor-product orthonormal basis, the fitted decision function can be expanded exactly in intrinsic RKHS coordinates. This leads to Orthogonal Representation Contribution Analysis (ORCA), a diagnostic framework based on normalized Orthogonal Kernel Contribution (OKC) indices. These indices quantify how the squared RKHS norm of the classifier is distributed across interaction orders, total polynomial degrees, marginal coordinate effects, and pairwise contributions. The methodology is fully post-training and requires neither surrogate models nor retraining. We illustrate its diagnostic value on a synthetic double-spiral problem and on a real five-dimensional echocardiogram dataset. The results show that the proposed indices reveal structural aspects of model complexity that are not captured by predictive accuracy alone.

artificial intelligence, machine learning, okc, (18 more...)

2604.15285

Country:

Europe > Spain > Galicia > Madrid (0.05)
Europe > United Kingdom > England > Cambridgeshire > Cambridge (0.04)

Genre: Research Report (1.00)

Industry: Health & Medicine > Therapeutic Area > Cardiology/Vascular Diseases (0.34)

Technology: Information Technology > Artificial Intelligence > Machine Learning > Statistical Learning > Support Vector Machines (0.69)

arXiv.org Machine LearningApr-14-2026

ADD for Multi-Bit Image Watermarking

Luo, An, Ding, Jie

As generative models enable rapid creation of high-fidelity images, societal concerns about misinformation and authenticity have intensified. A promising remedy is multi-bit image watermarking, which embeds a multi-bit message into an image so that a verifier can later detect whether the image is generated by someone and further identify the source by decoding the embedded message. Existing approaches often fall short in capacity, resilience to common image distortions, and theoretical justification. To address these limitations, we propose ADD (Add, Dot, Decode), a multi-bit image watermarking method with two stages: learning a watermark to be linearly combined with the multi-bit message and added to the image, and decoding through inner products between the watermarked image and the learned watermark. On the standard MS-COCO benchmark, we demonstrate that for the challenging task of 48-bit watermarking, ADD achieves 100\% decoding accuracy, with performance dropping by at most 2\% under a wide range of image distortions, substantially smaller than the 14\% average drop of state-of-the-art methods. In addition, ADD achieves substantial computational gains, with 2-fold faster embedding and 7.4-fold faster decoding than the fastest existing method. We further provide a theoretical analysis explaining why the learned watermark and the corresponding decoding rule are effective.

large language model, machine learning, natural language, (21 more...)

2604.11491

Country:

North America > Canada > Ontario > Toronto (0.14)
North America > United States > Minnesota (0.04)
Europe > Netherlands (0.04)

Genre: Research Report (1.00)

Industry: Information Technology > Security & Privacy (1.00)

Technology:

Information Technology > Sensing and Signal Processing > Image Processing (1.00)
Information Technology > Security & Privacy (1.00)
Information Technology > Artificial Intelligence > Vision (1.00)
(3 more...)

Reverdi, Justin, Zhang, Sixin, Gamboa, Fabrice, Gratton, Serge

Lipschitz bounds for integral kernels

arXiv.org Machine LearningApr-6-2026

Feature maps associated with positive definite kernels play a central role in kernel methods and learning theory, where regularity properties such as Lipschitz continuity are closely related to robustness and stability guarantees. Despite their importance, explicit characterizations of the Lipschitz constant of kernel feature maps are available only in a limited number of cases. In this paper, we study the Lipschitz regularity of feature maps associated with integral kernels under differentiability assumptions. We first provide sufficient conditions ensuring Lipschitz continuity and derive explicit formulas for the corresponding Lipschitz constants. We then identify a condition under which the feature map fails to be Lipschitz continuous and apply these results to several important classes of kernels. For infinite width two-layer neural network with isotropic Gaussian weight distributions, we show that the Lipschitz constant of the associated kernel can be expressed as the supremum of a two-dimensional integral, leading to an explicit characterization for the Gaussian kernel and the ReLU random neural network kernel. We also study continuous and shift-invariant kernels such as Gaussian, Laplace, and Matérn kernels, which admit an interpretation as neural network with cosine activation function. In this setting, we prove that the feature map is Lipschitz continuous if and only if the weight distribution has a finite second-order moment, and we then derive its Lipschitz constant. Finally, we raise an open question concerning the asymptotic behavior of the convergence of the Lipschitz constant in finite width neural networks. Numerical experiments are provided to support this behavior.

artificial intelligence, lipschitz constant, machine learning, (15 more...)